Comparative Analysis of N-gram Text Representation on Igbo Text Document Similarity
نویسندگان
چکیده
منابع مشابه
Text Retrieval from Document Images based on N-Gram Algorithm
In this paper, we propose a method of text retrieval from document images using a similarity measure based on an N-Gram algorithm. We directly extract image features instead of using optical character recognition. Character image objects are extracted from document images based on connected components first and then an unsupervised classifier is used to classify these objects. All objects are e...
متن کاملn-Gram-Based Text Compression
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, first, the proposed method splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bi...
متن کاملN-gram-based Text Attribution
Quantitative authorship attribution refers to the task of identifying the author of a text based on measurable features of the author’s style—a problem that has practical application in areas as diverse as literary scholarship, plagiarism detection, and criminal forensics. Attribution methods generally follow a generative approach, wherein a statistical “profile” is created for a set of candida...
متن کاملN-Gram-Based Text Categorization
Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization ...
متن کاملSimilarity Measures for Text Document Clustering
Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partitional clustering algorithms have been recognized to be more suitable as opposed to the hierarchical clustering schemes for processing large datasets....
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Applied Information Systems
سال: 2017
ISSN: 2249-0868
DOI: 10.5120/ijais2017451724